<?xml version="1.0" encoding="UTF-8"?>
<rss version='2.0' xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Paul Houle</title>
    <description>Creator of database animals and bayesian brains</description>
    <link>https://database_animals.silvrback.com/feed</link>
    <atom:link href="https://database_animals.silvrback.com/feed" rel="self" type="application/rss+xml"/>
    <category domain="database_animals.silvrback.com">Content Management/Blog</category>
    <language>en-us</language>
      <pubDate>Wed, 17 Sep 2014 06:36:09 -1100</pubDate>
    <managingEditor>paul@ontology2.com (Paul Houle)</managingEditor>
      <item>
        <guid>http://blog.databaseanimals.com/the-unofficial-basekb-browsing-interface#8224</guid>
          <pubDate>Wed, 17 Sep 2014 06:36:09 -1100</pubDate>
        <link>http://blog.databaseanimals.com/the-unofficial-basekb-browsing-interface</link>
        <title>The Unofficial :BaseKB Browsing Interface</title>
        <description>(It pays to look at your Twitter stream)</description>
        <content:encoded><![CDATA[<p><img alt="Kingsley Idehen" class="sb_float" src="https://silvrback.s3.amazonaws.com/uploads/25b234b0-8e12-437e-a397-251d388b6c19/565737618_29c4deeb77_medium.jpg" /></p>

<p>I was launching <a href="https://legalentityidentifier.info/lei/get/549300K1JD1LDRNBET48">yet another site</a> and I was looking for a place to (visually) link :BaseKB identifiers to.  I could have linked them to Freebase,  but I wanted something that was RDF native.</p>

<p>Well,  I did some searching on my <a href="https://twitter.com/paul_houle">Twitter feed</a> and found a tweet from the irrepressible <a href="https://twitter.com/kidehen">Kingsley Idehen</a> and discovered that he&#39;d built a browsing interface for :BaseKB a long time ago,  which is built into the Linked Data Explorer.</p>

<p>It&#39;s pretty good:  here&#39;s the page for <a href="https://legalentityidentifier.info/lei/get/787RXPR0UX0O0XUXPZ81">Nike</a> on my site,  which links in turn to the <a href="http://lod.openlinksw.com/describe/?uri=http%3A%2F%2Frdf.basekb.com%2Fns%2Fm.0lwkh">Nike</a> page in the LOD browser.</p>

<h1 id="unofficial-linking-and-the-unofficial-rdf-endpoint">Unofficial Linking and the Unofficial RDF endpoint</h1>

<p><img alt="Silvrback blog image" src="https://silvrback.s3.amazonaws.com/uploads/7f5df644-033d-4e45-b352-5ef0526242b8/Capture_large.PNG" /></p>

<p>For a long time I&#39;ve heard from people who&#39;d like to see a dereferencing endpoint on <code>rdf.basekb.com</code>.  I&#39;ve got some use for such a thing if I can get the latency really low,  but it hasn&#39;t been something I&#39;ve needed,  so it hasn&#39;t happened yet.</p>

<p>To make at least some sense of my URIs,  I am now redirecting</p>
<div class="highlight"><pre><span></span>http://rdf.basekb.com/ns/{X}
</pre></div>
<p>to</p>
<div class="highlight"><pre><span></span>http://lod.openlinksw.com/describe/?uri=http%3A%2F%2Frdf.basekb.com%2Fns%2F{X}
</pre></div>
<p>which will at least let me see usage statistics and let you do something interesting with the URIs.  Feel free also to use the link above if you want to show people a human-readable description of something in the :BaseKB namespace.</p>
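
<p>If you want to build that link yourself,  here is a minimal Java sketch.  It assumes the describe page only needs the full :BaseKB URI percent-encoded into its <code>uri</code> parameter,  as in the redirect target above;  the class and method names are made up for illustration.</p>
<div class="highlight"><pre><span></span>import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class DescribeLink {
    // Both prefixes are copied from the redirect rule described in this post.
    static final String NS = "http://rdf.basekb.com/ns/";
    static final String DESCRIBE = "http://lod.openlinksw.com/describe/?uri=";

    // Builds the human-readable describe link for a local name such as "m.0lwkh".
    // Assumption: percent-encoding the whole URI is all the endpoint requires.
    static String describeUrl(String localName) {
        return DESCRIBE + URLEncoder.encode(NS + localName, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Prints the same link used for the Nike page earlier in this post.
        System.out.println(describeUrl("m.0lwkh"));
    }
}
</pre></div>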
]]></content:encoded>
      </item>
      <item>
        <guid>http://blog.databaseanimals.com/identifiers-in-freebase#6428</guid>
          <pubDate>Tue, 16 Sep 2014 13:17:30 -1100</pubDate>
        <link>http://blog.databaseanimals.com/identifiers-in-freebase</link>
        <title>Identifiers in Freebase</title>
        <description>... how they look from RDF</description>
        <content:encoded><![CDATA[<h1 id="prelude">Prelude</h1>

<p>Most Linked Data sets represent links to the outside world in a format like</p>
<div class="highlight"><pre><span></span>:internalResource owl:sameAs external:thatResource .
</pre></div>
<p>where <code>owl:sameAs</code> could be replaced by some other predicate which is not so problematic in its definition.  When data is linked so,  you have many options for integration,  such as loading everything into the same triple store,  or dereferencing URIs one at a time.</p>

<p><img alt="Garret A. Morgan" class="sb_float" src="https://silvrback.s3.amazonaws.com/uploads/ce9cae0e-8a02-4c3e-bebf-547a41de0dab/garrettmorgan903_medium.jpg" /></p>

<p>Freebase was conceived before the time of Linked Data and SPARQL so it developed its own method of mapping identifiers to concepts;  this information is expressed in two different ways in RDF.</p>

<p>For the purpose of concision,  the <a href="https://aws.amazon.com/marketplace/pp/B00KDO5IFA/">:BaseKB Compact Edition</a> supports only one of these mechanisms,  the <code>:type.object.key</code> predicate,  while the <a href="https://aws.amazon.com/marketplace/pp/B00KRKRYW0">:BaseKB Complete Edition</a> supports both.</p>

<p>This article teaches you how to look up internal and external identifiers using the <code>:type.object.key</code> predicate and the special <code>key:</code> namespace.</p>

<h1 id="inventor-of-the-traffic-light">Inventor of the Traffic Light</h1>

<p>Let&#39;s take the case of Garret Morgan,  who is <a href="https://www.freebase.com/m/01tp2v">:m.01tp2v</a> in Freebase and who comes about as close to a <a href="http://en.wikipedia.org/wiki/Garrett_Morgan#Safety_hood">real-life Tony Stark</a> as anyone.  If we look up the identifiers that Freebase knows for him with this query</p>
<div class="highlight"><pre><span></span>sparql

prefix : &lt;http://rdf.basekb.com/ns/&gt; 

select ?key {
   :m.01tp2v  :type.object.key ?key .
} ORDER BY ?key
</pre></div>
<p>we get</p>

<p><img alt="Morgan Results" src="https://silvrback.s3.amazonaws.com/uploads/899e94fd-be33-4058-b9f8-fc8dffee088a/morgan_large.PNG" /></p>

<p>Note that Freebase keys are structured like paths in Unix.  <code>:type.object.key</code> spells them out completely,  while the alternative representation represents the directed acyclic graph directly.</p>

<p>Some of these identifiers have been inserted by external entities (e.g. <code>/base/ranker/</code> and <code>/user/avh/ellerdale</code>).  We also see a key in the <code>/en/</code> namespace,  which means you can refer to this entity as <code>/en/garret_a_morgan</code> in MQL queries.  In the early days,  Freebase created human-readable identifiers for all topics,  but this policy did not scale well,  and Freebase eventually converged on the consistent use of mids for everything that is not a type or a property.</p>

<h2 id="unicode-character-encoding-in-keys">Unicode character encoding in keys</h2>

<p>An important bit of convention is that Freebase encodes non-plaintext characters in identifiers as <code>$xxxx</code> where <code>xxxx</code> is hexadecimal for a 16-bit Unicode codepoint.  You can see this used above,  where &quot;Garret A. Morgan&quot; is spelled out as</p>
<div class="highlight"><pre><span></span>Garret_A$002E_Morgan
</pre></div>
<p>Hexadecimal <code>2E</code> is decimal 46 in ASCII and Unicode,  which represents a period.  The same encoding is used for the Korean variant &quot;개릿 모건&quot;,  which is Morgan&#39;s name spelled out phonetically:</p>
<div class="highlight"><pre><span></span>/wikipedia/ko/$AC1C$B9BF_$BAA8$AC74
</pre></div>
<p>Note that characters outside the Basic Multilingual Plane (with codepoints greater than <code>$FFFF</code>) are encoded as a pair of <code>$xxxx</code> sequences using <a href="http://en.wikipedia.org/wiki/Universal_Character_Set_characters#Surrogates">surrogate characters</a>.  The following Java function decodes the <code>$</code> sequences in Freebase keys:</p>
<div class="highlight"><pre><span></span><span class="kd">public</span> <span class="n">String</span> <span class="nf">unescapeFreebaseKey</span><span class="o">(</span><span class="n">String</span> <span class="n">in</span><span class="o">)</span> <span class="o">{</span>
    <span class="n">StringBuilder</span> <span class="n">out</span><span class="o">=</span><span class="k">new</span> <span class="n">StringBuilder</span><span class="o">(</span><span class="n">in</span><span class="o">.</span><span class="na">length</span><span class="o">());</span>
    <span class="n">String</span> <span class="o">[]</span> <span class="n">parts</span><span class="o">=</span><span class="n">in</span><span class="o">.</span><span class="na">split</span><span class="o">(</span><span class="s">&quot;[$]&quot;</span><span class="o">);</span>
    <span class="n">out</span><span class="o">.</span><span class="na">append</span><span class="o">(</span><span class="n">parts</span><span class="o">[</span><span class="mi">0</span><span class="o">]);</span>
    <span class="k">for</span><span class="o">(</span><span class="kt">int</span> <span class="n">i</span><span class="o">=</span><span class="mi">1</span><span class="o">;</span><span class="n">i</span><span class="o">&lt;</span><span class="n">parts</span><span class="o">.</span><span class="na">length</span><span class="o">;</span><span class="n">i</span><span class="o">++)</span> <span class="o">{</span>
        <span class="n">String</span> <span class="n">hexSymbols</span><span class="o">=</span><span class="n">parts</span><span class="o">[</span><span class="n">i</span><span class="o">].</span><span class="na">substring</span><span class="o">(</span><span class="mi">0</span><span class="o">,</span><span class="mi">4</span><span class="o">);</span>
        <span class="n">String</span> <span class="n">remainder</span><span class="o">=</span><span class="s">&quot;&quot;</span><span class="o">;</span>
        <span class="k">if</span><span class="o">(</span><span class="n">parts</span><span class="o">[</span><span class="n">i</span><span class="o">].</span><span class="na">length</span><span class="o">()&gt;</span><span class="mi">4</span><span class="o">)</span> <span class="o">{</span>
            <span class="n">remainder</span><span class="o">=</span><span class="n">parts</span><span class="o">[</span><span class="n">i</span><span class="o">].</span><span class="na">substring</span><span class="o">(</span><span class="mi">4</span><span class="o">);</span>
        <span class="o">}</span>

        <span class="kt">int</span> <span class="n">codePoint</span><span class="o">=</span><span class="n">Integer</span><span class="o">.</span><span class="na">parseInt</span><span class="o">(</span><span class="n">hexSymbols</span><span class="o">,</span><span class="mi">16</span><span class="o">);</span>
        <span class="kt">char</span><span class="o">[]</span> <span class="n">character</span><span class="o">=</span><span class="n">Character</span><span class="o">.</span><span class="na">toChars</span><span class="o">(</span><span class="n">codePoint</span><span class="o">);</span>
        <span class="n">out</span><span class="o">.</span><span class="na">append</span><span class="o">(</span><span class="n">character</span><span class="o">);</span>
        <span class="n">out</span><span class="o">.</span><span class="na">append</span><span class="o">(</span><span class="n">remainder</span><span class="o">);</span>
    <span class="o">}</span>

    <span class="k">return</span> <span class="n">out</span><span class="o">.</span><span class="na">toString</span><span class="o">();</span>
<span class="o">}</span>
</pre></div>
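<p>Going the other way is a matter of reversing the convention.  The sketch below is not Freebase&#39;s own encoder:  it assumes that only ASCII letters,  digits,  <code>_</code> and <code>-</code> pass through unescaped and that spaces become underscores,  which is consistent with the examples above but may not cover every rule Freebase applies.  The class name is hypothetical.</p>
<div class="highlight"><pre><span></span>final class FreebaseKeyEscaper {
    // Assumption: these are the only characters that never need escaping.
    private static final String PLAIN =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_-";

    // Turns a title into key form: spaces become '_', anything else
    // outside PLAIN becomes $XXXX (four upper-case hex digits).
    static String escape(String title) {
        StringBuilder out = new StringBuilder(title.length());
        // Iterating over chars (UTF-16 code units) means a character beyond
        // $FFFF naturally comes out as two $XXXX groups, one per surrogate.
        for (char c : title.toCharArray()) {
            if (c == ' ') {
                out.append('_');
            } else if (PLAIN.indexOf(c) != -1) {
                out.append(c);
            } else {
                out.append(String.format("$%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("Garret A. Morgan")); // Garret_A$002E_Morgan
    }
}
</pre></div>
<p>Feeding the result back through <code>unescapeFreebaseKey</code> restores the original text,  except that spaces stay underscores.</p>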
<h2 id="the-key-s-to-wikipedia">The key(s) to Wikipedia</h2>

<p>Let&#39;s take a look at the conventions used in Wikipedia keys.  Wikipedia keys come in several kinds:</p>
<div class="highlight"><pre><span></span>/wikipedia/{lang}/
/wikipedia/{lang}_title/
/wikipedia/{lang}_id/
</pre></div>
<p>where <code>{lang}</code> is an <a href="http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes">ISO 639-1</a> code or a variation of one.  Wikipedia keys are derived from Wikipedia titles by replacing the space character with an underscore,  and escaping punctuation and non-ASCII characters with the <code>$</code>-convention described above.</p>

<p>A page in Wikipedia has a &quot;real&quot; title,  but may appear under different names because of redirect records that point to the real page.  The real title is encoded in the <code>/wikipedia/{lang}_title/</code> namespaces,  whereas the titles that redirect to the real title are encoded in the <code>/wikipedia/{lang}/</code> namespaces.</p>

<p>Generally systems should accept all Wikipedia titles from the outside system,  but should use the official form when exporting data to the outside.</p>

<p>Wikipedia titles have the special property of being unique,  unlike Freebase titles,  which can be shared by many objects.  Wikipedia titles are disambiguated in a rather ad-hoc manner.  Sometimes Wikipedians choose names to avoid conflict,  but frequently they add something to the title to disambiguate it,  such as a few words in parentheses giving the type of the object,  for example</p>

<p><a href="http://en.wikipedia.org/wiki/The_Battle_of_Los_Angeles_(album)">The Battle of Los Angeles (album)</a></p>

<p>Note that the <code>/wikipedia/{lang}_id/</code> namespace contains numeric identifiers,  which are the internal primary key in the database tables behind Wikipedia.  These identifiers are supposed to remain stable when titles change,  so they provide one more interconnection between Freebase and Wikipedia.</p>
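
<p>To make the three kinds of Wikipedia key concrete,  here is a rough Java sketch that splits a full key into its language,  its kind and its (still escaped) value.  The class,  the field names and the labels <code>redirect</code>,  <code>title</code> and <code>id</code> are invented for this example,  and it assumes that no language code itself ends in <code>_title</code> or <code>_id</code>.</p>
<div class="highlight"><pre><span></span>final class WikipediaKey {
    final String lang;   // e.g. "en" or "ko"
    final String kind;   // "redirect", "title" or "id" (labels chosen for this sketch)
    final String value;  // key value, still $-escaped

    private WikipediaKey(String lang, String kind, String value) {
        this.lang = lang;
        this.kind = kind;
        this.value = value;
    }

    // Parses keys shaped like /wikipedia/{lang}/..., /wikipedia/{lang}_title/...
    // or /wikipedia/{lang}_id/...; anything else is rejected.
    static WikipediaKey parse(String key) {
        String[] parts = key.split("/", 4); // "", "wikipedia", namespace, value
        if (parts.length != 4 || !parts[0].isEmpty() || !parts[1].equals("wikipedia")) {
            throw new IllegalArgumentException("not a Wikipedia key: " + key);
        }
        String ns = parts[2];
        if (ns.endsWith("_title")) {
            return new WikipediaKey(ns.substring(0, ns.length() - 6), "title", parts[3]);
        }
        if (ns.endsWith("_id")) {
            return new WikipediaKey(ns.substring(0, ns.length() - 3), "id", parts[3]);
        }
        return new WikipediaKey(ns, "redirect", parts[3]);
    }

    public static void main(String[] args) {
        WikipediaKey k = parse("/wikipedia/ko/$AC1C$B9BF_$BAA8$AC74");
        System.out.println(k.lang + " " + k.kind + " " + k.value);
    }
}
</pre></div>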

<h2 id="keys-in-the-complete-edition">Keys in the complete edition</h2>

<p>The Compact Edition of :BaseKB contains only the <code>:type.object.key</code> identifiers.  I believe these are sufficient for almost any task,  but the Complete Edition provides a different view of Freebase keys.  It turns out that any Freebase namespace,  like</p>
<div class="highlight"><pre><span></span>/authority/iso/3166-1/alpha-2
</pre></div>
<p>can be converted to a URI</p>
<div class="highlight"><pre><span></span>&lt;http://rdf.freebase.com/key/key.authority.iso.3166-1.alpha-2&gt;
</pre></div>
<p>and Freebase uses this as a predicate like so</p>
<div class="highlight"><pre><span></span>?subject ?keyPredicate &quot;String_Value_Of_Key&quot;.
</pre></div>
<p>In this case,  <a href="http://en.wikipedia.org/wiki/ISO_3166-1_alpha-2">ISO 3166-1 Alpha 2</a> is the fancy name for the commonly used two-letter country abbreviations,  and by searching this namespace,  we can make a list of current countries,  together with their codes and labels.</p>
<div class="highlight"><pre><span></span>prefix : &lt;http://rdf.basekb.com/ns/&gt;
prefix key: &lt;http://rdf.basekb.com/key/key.&gt;
prefix rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt;

select ?country ?code  ?label {
   ?country key:authority.iso.3166-1.alpha-2 ?code .
   ?country rdfs:label ?label .
   FILTER(lang(?label)=&#39;en&#39;)
}
</pre></div>
<p>The first few results look like</p>

<p><img alt="Country Codes" src="https://silvrback.s3.amazonaws.com/uploads/f971cc71-f918-4c50-a5b3-5ae60dbe8930/Country%20Kodes_large.PNG" /></p>
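
<p>The namespace-to-predicate conversion used in this section is mechanical enough to sketch in a few lines of Java.  The class and method names are hypothetical,  and the base URI is a parameter because the examples above show both <code>http://rdf.freebase.com/key/</code> and <code>http://rdf.basekb.com/key/</code>.</p>
<div class="highlight"><pre><span></span>final class KeyPredicates {
    // Converts a Freebase namespace such as /authority/iso/3166-1/alpha-2
    // into the URI of the corresponding key predicate.
    // Assumption: the namespace always starts with a single '/'.
    static String predicateUri(String base, String namespace) {
        // Drop the leading '/', then turn the remaining separators into dots.
        return base + "key." + namespace.substring(1).replace('/', '.');
    }

    public static void main(String[] args) {
        System.out.println(
            predicateUri("http://rdf.freebase.com/key/", "/authority/iso/3166-1/alpha-2"));
    }
}
</pre></div>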

<h2 id="conclusion">Conclusion</h2>

<p>Freebase has a mechanism for representing internal or external identifiers that is expressed in two different ways.  When you learn how to use this mechanism,  you&#39;ll find it easy to link up Freebase with other data sources.</p>
]]></content:encoded>
      </item>
      <item>
        <guid>http://blog.databaseanimals.com/announcing-legal-entity-identifier-search#7709</guid>
          <pubDate>Mon, 25 Aug 2014 04:24:21 -1100</pubDate>
        <link>http://blog.databaseanimals.com/announcing-legal-entity-identifier-search</link>
        <title>Announcing Legal Entity Identifier Search</title>
        <description>Yet Another New Site</description>
        <content:encoded><![CDATA[<p>I&#39;d like to announce a <a href="https://legalentityidentifier.info/lei/lookup">new site for looking up Legal Entity Identifiers</a>.</p>

<p><img alt="Silvrback blog image" src="https://silvrback.s3.amazonaws.com/uploads/3af73b05-c285-449a-82e0-ee556887318c/LEILookup_large.PNG" /></p>

<p>This is the first consumer-facing site I&#39;ve done in a while.  It&#39;s a simple site right now,  but I made an effort to make the site insanely fast by implementing the SPDY protocol and taking advantage of new developments in AWS that can give close to bare metal performance.</p>

<p>Legal Entity Identifiers are important because they are the new international standard for identifying companies that are parties to financial transactions.  My plan is to develop a site that is useful to people who need to look up LEIs in their day-to-day work and also use it as a testbed for concept matching technology that I&#39;m developing around <a href="http://basekb.com/">:BaseKB</a>.</p>
]]></content:encoded>
      </item>
  </channel>
</rss>